v2.3 bundle: CKB structural parity + impact-query fixes (v2.3.0 → v2.3.5)#24

Merged
SimplyLiz merged 18 commits into main from v2.3.0
Apr 25, 2026


@SimplyLiz SimplyLiz commented Apr 24, 2026

Summary

v2.3 release bundle — 9 commits on top of main, spanning the CKB structural-parity work (v2.3.0) through the forward-direction name-bridge fix (v2.3.5).

v2.3.0 — CKB structural-parity bundle. Rich symbol metadata (signature_normalized, modifiers, visibility, container_name, extraction_tier, modifiers_source); reference classification (ReferenceKind + is_test on every occurrence); QueryBlastRadiusSymbol, QueryOutgoingCalls; ranked + filtered QueryWorkspaceSymbols.

v2.3.1 — CKB import landing. RegisterProjectRoot + daemon-side canonical URI resolution; EdgesSource provenance on blast radius; tier-1 edge back-fill when SCIP imports carry none; lip import --verify; self-echo deadlock fix on bulk precomputed imports.

v2.3.2 — CKB testdrive follow-up. edges_source moved onto BlastRadiusResult; tier-1 back-fill URIs translate to SCIP descriptor form (same-file + cross-file); path-traversal guard on SCIP ingestion; callee_name_to_callers normalisation; Phase-3 blank-symbol_uri fallback.

v2.3.3 — QueryOutgoingImpact. Forward-direction twin of QueryBlastRadiusSymbol with the same EnrichedOutgoingImpact envelope, edges_source, and semantic enrichment via SemanticImpactItem { source: Static | Semantic | Both }.

v2.3.4 — module_id on impact items. ImpactItem.module_id + SemanticImpactItem.module_id resolved once at upsert time (slice URI / SCIP package / manifest walk). Unlocks CKB's RecomputeBlastRadius.ModuleCount for non-sliced LIP-only traffic.

v2.3.5 — Forward-direction name-bridge symmetry. New caller_name_to_callees index mirrors v2.3.2's callee_name_to_callers at every edge-insertion site; outgoing_impact_for BFS now consults both URI-exact and name-bridge indexes on every hop. Closes the asymmetry where QueryOutgoingImpact seeded from a SCIP descriptor URI returned empty direct_items for name-overloaded methods.

protocol_version stays at 2; every new field is #[serde(default, skip_serializing_if = …)]; every new message is advertised in HandshakeResult.supported_messages.

Test plan

  • cargo test --lib — 438/438 passing
  • cargo check — clean, no warnings
  • Regression tests per subsystem: blast_radius_phase3_fallback_for_tier1_caller_uri, outgoing_impact_phase3_fallback_for_tier1_callee_uri, outgoing_impact_name_bridge_for_tier1_caller_uri, normalize_callee_name_strips_scip_descriptor_suffixes, daemon_bulk_precomputed_import_does_not_deadlock, blast_radius_surfaces_module_id_from_scip_descriptor, blast_radius_surfaces_module_id_from_cargo_toml_walk, outgoing_impact_surfaces_module_id, edges_source_survives_all_response_envelopes, tier1_backfill_translates_caller_uri_to_scip_fragment, tier1_backfill_resolves_cross_file_callee_via_name_index
  • CKB testdrive against a real Go repo with name-overloaded methods — confirm QueryOutgoingImpact returns non-empty direct_items (the v2.3.5 scenario)
  • CKB integration sign-off on module_id surfacing in RecomputeBlastRadius.ModuleCount

🤖 Generated with Claude Code

SimplyLiz and others added 18 commits April 16, 2026 09:12
… and embeddings

The SCIP import path pushed pre-computed symbols/occurrences/edges via
Delta but the daemon silently dropped them in multiple places:

- Journal wrote empty text → symbols lost on daemon restart
- upsert_file_precomputed ignored CPG edges → blast-radius broken
- stale_files hashed empty text → infinite Merkle re-sync loop
- file_source_text returned "" → embeddings, stream_context, and
  explain-match all failed for imported files

Fix: FileInput now carries a precomputed flag and content_hash.
JournalEntry::UpsertFilePrecomputed persists symbols, occurrences,
and edges so they survive compact + replay. stale_files uses the
stored content_hash. file_source_text falls back to disk for
precomputed file:// URIs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- session.rs: explain why Tier 2 verification is skipped for pre-computed
  SCIP imports (source_opt is None by design, SCIP emitters are
  authoritative)
- export.rs: document that SCIP round-trips lose CPG edges since the
  SCIP wire format has no edge representation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on, CI workflow

Five improvements in one batch:

1. SCIP integration test — end-to-end test proving pre-computed symbols
   from Delta are searchable via WorkspaceSymbols and resolvable via
   QueryDefinition (regression coverage for the import path fix)

2. Proto fix — Relationship.is_override → is_definition to match
   upstream SCIP field 5 semantics; export mapping updated accordingly

3. SCIP CI action — reusable GitHub Actions workflow
   (.github/workflows/scip-import.yml) that runs a SCIP indexer
   (rust/typescript/python), starts a LIP daemon, and pushes the
   index at confidence 100

4. Tier 2 test harness — 14 unit tests for the verification manager:
   routing dispatch, channel backpressure, confidence elevation,
   symbol upgrade merging, backend unavailability

5. Name-dep invalidation — new invalidated_files_for() query answering
   "which files break if these symbols change" using the existing
   file_consumed_names index; wired into the daemon protocol as
   QueryInvalidatedFiles / InvalidatedFilesResult

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
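
The name-dep invalidation query in item 5 can be pictured as a set intersection over a per-file index of consumed names. The sketch below is only an illustration under stated assumptions: the `FileConsumedNames` alias, the string-keyed map, and the function signature are hypothetical stand-ins, not the daemon's actual types.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical stand-in for the `file_consumed_names` index:
/// file URI -> names that file references.
pub type FileConsumedNames = HashMap<String, HashSet<String>>;

/// Returns every file that consumes at least one of the changed names.
/// A BTreeSet keeps the output order deterministic.
pub fn invalidated_files_for(index: &FileConsumedNames, changed: &[&str]) -> BTreeSet<String> {
    index
        .iter()
        .filter(|(_, names)| changed.iter().any(|c| names.contains(*c)))
        .map(|(file, _)| file.clone())
        .collect()
}
```

A file that consumes none of the changed names is simply absent from the result, which is what lets a caller answer "which files break if these symbols change" without re-parsing anything.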
New protocol message that computes blast radius for all symbols defined
in the given changed files in one call. When min_score is present,
each file's embedding is compared against the index and neighbours
above the threshold are returned as semantic_items with a source tier
(static / semantic / both).

Designed for CKB's BlastRadiusEnricher: one round-trip prefetch in
reviewPR, static callers stay authoritative for thresholds, semantic
callers are advisory with per-item confidence.

Wire format:
  → query_blast_radius_batch { changed_file_uris, min_score? }
  ← blast_radius_batch_result { results: [EnrichedBlastRadius] }

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
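
The static / semantic / both source tier described above amounts to merging two result sets. A minimal sketch follows, assuming callers are identified by URI strings and that "above the threshold" means a score at or above `min_score`; the `ImpactSource` enum and `tag_sources` function are illustrative names, not the shipped API.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImpactSource {
    Static,
    Semantic,
    Both,
}

/// Merges static callers with embedding neighbours that clear `min_score`.
/// The `>=` comparison is an assumption; the source only says "above the threshold".
pub fn tag_sources(
    static_callers: &BTreeSet<String>,
    neighbours: &[(String, f32)],
    min_score: f32,
) -> BTreeMap<String, ImpactSource> {
    let mut out = BTreeMap::new();
    for uri in static_callers {
        out.insert(uri.clone(), ImpactSource::Static);
    }
    for (uri, score) in neighbours {
        if *score < min_score {
            continue;
        }
        out.entry(uri.clone())
            // Only a static hit is promoted; a repeated semantic hit stays Semantic.
            .and_modify(|s| {
                if *s == ImpactSource::Static {
                    *s = ImpactSource::Both;
                }
            })
            .or_insert(ImpactSource::Semantic);
    }
    out
}
```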
…changelog

- LIP_SPEC.mdx §8.1.1: batch blast radius with semantic enrichment,
  symbol kind filtering rationale, embedding scope note (file-level
  today, per-function when chunked embeddings land)
- daemon.mdx: add QueryBlastRadiusBatch to protocol message table
- CHANGELOG.md: document all unreleased changes (SCIP fixes, journal
  persistence, name-dep invalidation, blast radius batch)
- db.rs: filter blast_radius_batch to Function/Method/Class/Interface/
  Constructor/Macro kinds; add embedding scope comment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…yAbiHash, Tier1.5, backoff

Tier 1:
- NearestItem.embedding_model: per-hit model provenance on all nearest-neighbour results
- blast_radius_batch: symbol-level semantic enrichment when symbol_embeddings available;
  SemanticImpactItem.symbol_uri now non-empty at function granularity, falls back to file-level
- ReindexStale { uris, max_age_seconds } → ReindexStaleResult { reindexed, skipped }:
  atomic check-then-reindex replacing the QueryFileStatus → ReindexFiles race

Tier 2:
- BatchFileStatus { uris } → BatchFileStatusResult { entries: Vec<FileStatusEntry> }:
  multi-file status in one round-trip, batchable
- Tier 2 backoff recovery: all 8 LSP backends recover from crashes with exponential backoff
  (2–300s); permanently disabled only after 8 consecutive failures (BackoffState struct)

Tier 3:
- QueryAbiHash { uri } → AbiHashResult { uri, hash }: SHA-256 over exported symbol surface,
  stable recompilation trigger (batchable); Kotlin IC-style ABI fingerprinting
- LipDatabase::run_tier1_5_inference(): Datalog fixed-point loop — callee elevation when all
  callers ≥ 80 confidence, exported-leaf +5 bump; ceiling 65 (Tier 1.5 level)

All new variants wired into variant_tag, supported_messages, is_batchable, and the
BatchQuery sync handler. 313 unit tests + 14 integration tests green, clippy clean, fmt clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
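
For readers unfamiliar with the Tier 2 backoff recovery mentioned above, here is a rough sketch of a crash counter with exponential delay. Only the 2 s floor, the 300 s ceiling, the limit of 8 consecutive failures, and the `BackoffState` name come from the commit message; the doubling factor, the method names, and the field layout are assumptions.

```rust
use std::time::Duration;

const BASE_SECS: u64 = 2;
const MAX_SECS: u64 = 300;
const MAX_CONSECUTIVE_FAILURES: u32 = 8;

#[derive(Debug, Default)]
pub struct BackoffState {
    consecutive_failures: u32,
}

impl BackoffState {
    /// Records a crash. Returns the delay before the next restart attempt,
    /// or `None` once the backend should stay disabled.
    pub fn record_failure(&mut self) -> Option<Duration> {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES {
            return None;
        }
        // Assumed doubling schedule: 2 s, 4 s, 8 s, ... capped at MAX_SECS.
        let exp = self.consecutive_failures - 1;
        let secs = BASE_SECS.saturating_mul(1u64 << exp).min(MAX_SECS);
        Some(Duration::from_secs(secs))
    }

    /// A successful start resets the failure streak.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }
}
```

With a doubling factor the 300 s ceiling is never reached before the eighth failure, so in this sketch it acts purely as a safety cap; the real schedule may use a different multiplier.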
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
EnrichedBlastRadius gains file_uri so callers can trace results back to
their input file. BlastRadiusBatchResult gains not_indexed_uris
(skip_serializing_if empty, back-compat) to distinguish "URI not in
index" from "URI has zero callers" — previously both were silent empty.
blast_radius_batch now checks file_inputs before calling file_symbols.

file_symbols guards against the cold-cache path for precomputed files:
if sym_cache is cold and file is marked precomputed, return [] instead
of falling through to Tier 1 parsing on empty text.

Three new tests: blast_radius_batch_not_indexed_uris_reported,
blast_radius_batch_file_uri_populated,
file_symbols_precomputed_cold_cache_returns_empty.

Docs: v2.1 + v2.2 roadmap sections added to spec.mdx and LIP_SPEC.mdx;
ReindexStale/BatchFileStatus/QueryAbiHash added to daemon message table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ships 5 additive features so CKB can retire its duplicate SCIP parser.
Protocol version stays at 2; every new field uses serde defaults +
skip_serializing_if. Drift-guard test covers supported_messages and
variant_tag for the two new client messages.

#1 Rich symbol metadata — signature_normalized, modifiers, visibility
   + visibility_confidence, container_name, extraction_tier,
   modifiers_source on OwnedSymbolInfo. Tier-1 populates the structural
   fields; SCIP importer derives modifiers via prefix-parse and uses
   upstream-compatible enclosing_symbol=8.

#2 Reference classification — ReferenceKind (Unknown/Call/Read/Write/
   Type/Implements/Extends) + is_test on OwnedOccurrence. Tier-1
   classifier uses tree-sitter parent/field lookup; SCIP import/export
   maps to SymbolRole::Read/WriteAccess and Test bits.

#3 QueryBlastRadiusSymbol — single-symbol wrapper around
   blast_radius_for_symbol with semantic enrichment; returns None for
   unknown or unindexed symbols.

#4 QueryOutgoingCalls — forward call-graph BFS. New caller_to_callees
   index mirrors the reverse map, populated in upsert paths and cleaned
   in remove_file_call_edges. Depth clamped [1,8]; NODE_LIMIT=200 with
   truncated flag.

#5 Ranked workspace symbols — kind_filter, scope, modifier_filter on
   QueryWorkspaceSymbols; WorkspaceSymbolsResult gains ranked:
   Vec<RankedSymbol> with tiered scoring (Exact=1.0 / Prefix=0.8 /
   Fuzzy=0.5). Empty query preserves pre-v2.3 behavior (ranked=[]).

21 integration tests green; new coverage for every feature.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
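
The tiered scoring in #5 can be sketched as follows. The three score values and the empty-query behaviour come from the commit message; case-insensitive matching, the subsequence definition of "fuzzy", and the alphabetical tie-break are assumptions made for the example, and `rank` is a hypothetical helper.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct RankedSymbol {
    pub name: String,
    pub score: f32,
}

/// True when every char of `needle` appears in `haystack` in order.
fn is_subsequence(needle: &str, haystack: &str) -> bool {
    let mut chars = haystack.chars();
    needle.chars().all(|n| chars.any(|h| h == n))
}

/// Exact = 1.0, Prefix = 0.8, Fuzzy = 0.5, no match = None.
pub fn score(query: &str, name: &str) -> Option<f32> {
    let (q, n) = (query.to_lowercase(), name.to_lowercase());
    if q.is_empty() {
        return None; // empty query keeps `ranked` empty
    }
    if n == q {
        Some(1.0)
    } else if n.starts_with(&q) {
        Some(0.8)
    } else if is_subsequence(&q, &n) {
        Some(0.5)
    } else {
        None
    }
}

/// Scores every candidate and sorts best-first, ties broken by name.
pub fn rank(query: &str, names: &[&str]) -> Vec<RankedSymbol> {
    let mut ranked: Vec<RankedSymbol> = names
        .iter()
        .filter_map(|&n| score(query, n).map(|s| RankedSymbol { name: n.to_string(), score: s }))
        .collect();
    ranked.sort_by(|a, b| b.score.total_cmp(&a.score).then_with(|| a.name.cmp(&b.name)));
    ranked
}
```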
Converge client and daemon URI conventions and back-fill call edges when
SCIP imports omit them. Fixes the "lip import prints success but every
file shows indexed: false" bug reported against v2.3.0.

- RegisterProjectRoot message + daemon canonicalizes lip://local/<rel>
  against registered roots (longest-first); capability advertised in
  HandshakeResult.supported_messages
- EdgesSource provenance on EnrichedBlastRadius (Tier1 |
  ScipWithTier1Edges | ScipOnly | Empty) so CKB can route around files
  LIP has no structural edges for
- upsert_file_precomputed reads the file from disk and runs tier-1 when
  the incoming SCIP document has empty edges
- lip import emits canonical lip://local//<abs>/<rel> (or
  lip://local/<rel> when Metadata.project_root is absent), replacing the
  old file:///<rel> form that silently mismatched CKB queries
- lip import --verify round-trips up to 10 sampled documents after push
  and exits non-zero on any mismatch

Bumps workspace version 2.2.0 → 2.3.1; v2.3.0 features and v2.3.1 fixes
ship in the same release.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Every session subscribes to the daemon's push-notification broadcast and
writes pending notifications back to its client after each response. Every
Delta{Upsert} also emitted an IndexChanged onto that same broadcast — so
the session wrote TWO frames per delta (DeltaAck + self-emitted
IndexChanged) while `lip import` read ONE frame per iteration. Frame
production ran one frame ahead of consumption; after ~65 deltas the 8 KB
macOS AF_UNIX send buffer filled, write_message parked mid-frame,
read_message never ran, both processes idle at 0% CPU.

Fix: tag every broadcast message with the emitting session's id
(Notification { source_session: Option<u64>, message: ServerMessage }) and
have the drain loop skip envelopes whose source_session matches its own.
Tier 2 upgrades emit with source_session=None so they still reach every
session. LipDaemon holds an AtomicU64 and assigns a fresh id per accept.

Regression test daemon_bulk_precomputed_import_does_not_deadlock pushes
200 precomputed deltas through a single session and fails fast if any
IndexChanged echo reaches the client. Verified: test fails at delta 1
without the filter, passes in 60ms with it.

Latent since v2.2.0 when IndexChanged-on-every-upsert landed; surfaced
only now because the v2.3.1 URI fix let CKB imports run long enough to
hit the 8 KB buffer wall (836-doc SCIP bundle froze at ~130).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
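
The self-echo filter described above boils down to tagging each broadcast envelope with its emitter and skipping your own. The sketch below mirrors the `Notification { source_session, message }` shape from the commit message; the `ServerMessage` variants, the `SessionIds` wrapper, and `drain_for_session` are simplified, hypothetical stand-ins for the real daemon types.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Simplified stand-in for the daemon's server message type.
#[derive(Debug, Clone, PartialEq)]
pub enum ServerMessage {
    IndexChanged { uri: String },
    SymbolUpgraded { uri: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Notification {
    /// `None` means "deliver to every session" (e.g. Tier 2 upgrades).
    pub source_session: Option<u64>,
    pub message: ServerMessage,
}

/// Hands out a fresh session id per accepted connection.
pub struct SessionIds(AtomicU64);

impl SessionIds {
    pub fn new() -> Self {
        SessionIds(AtomicU64::new(1))
    }
    pub fn next(&self) -> u64 {
        self.0.fetch_add(1, Ordering::Relaxed)
    }
}

/// Drops envelopes emitted by `session_id` itself and keeps everything else.
pub fn drain_for_session(session_id: u64, pending: Vec<Notification>) -> Vec<ServerMessage> {
    pending
        .into_iter()
        .filter(|n| n.source_session != Some(session_id))
        .map(|n| n.message)
        .collect()
}
```

With the filter in place a bulk importer sees one frame per delta again, so a client that reads exactly one frame per iteration no longer falls behind the writer.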
… + path-traversal guard)

Five correctness fixes discovered after v2.3.1 shipped and CKB began
consuming EnrichedBlastRadius end-to-end. Wire-compatible via
#[serde(flatten)] on BlastRadiusResult; protocol_version stays at 2.

Changed:
- edges_source moved from EnrichedBlastRadius onto BlastRadiusResult
  so non-enriched QueryBlastRadius carries call-edge provenance too;
  JSON shape unchanged.

Fixed:
- Tier-1 back-fill URIs now translate to SCIP descriptor form, both
  same-file (via display_name map) and cross-file (via name_to_symbols
  index with single-match guard). Symbol URIs no longer blank on wire
  responses for CKB dedup.
- Path-traversal guard in convert_document rejects SCIP documents
  whose relative_path escapes the project root under string-level
  normalization — stops Go build-cache artefacts leaking into the
  graph.
- Double lip://local/ prefix in callee_to_callers keys: lip_uri now
  detects an existing lip://local/ prefix when the back-fill replays
  tree-sitter against a canonical-URI file.
- SCIP-descriptor vs tier-1-identifier mismatch in callee_name_to_callers:
  new normalize_callee_name(fragment) strips trailing () / . / : / #
  at all four insert sites plus the BFS lookup, so SCIP and tier-1
  callees share keys.
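
As an illustration of the string-level path-traversal check listed under Fixed, the following sketch rejects a `relative_path` whose `..` components climb above the project root. The function name and exact rules are assumptions; the real guard in `convert_document` may handle more cases (Windows drive prefixes, for instance, are not covered here).

```rust
/// Returns true when `relative_path` would resolve outside the project root.
/// Works on the string alone; nothing is looked up on disk.
pub fn escapes_root(relative_path: &str) -> bool {
    // Absolute paths are never relative to the root.
    if relative_path.starts_with('/') || relative_path.starts_with('\\') {
        return true;
    }
    let mut depth: i32 = 0;
    for part in relative_path.split(|c: char| c == '/' || c == '\\') {
        match part {
            "" | "." => {}
            ".." => {
                depth -= 1;
                if depth < 0 {
                    return true; // climbed above the root
                }
            }
            _ => depth += 1,
        }
    }
    false
}
```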

Added:
- LIP_DEBUG_EDGES=1 diagnostic gating for upsert_file_precomputed,
  Phase-2 BFS, and the wire serializer. Wire log reports
  has_edges_source / body_bytes / 500-char head — truncation-free.

Tests: +normalize_callee_name_strips_scip_descriptor_suffixes,
+edges_source_survives_all_response_envelopes,
+tier1_backfill_translates_caller_uri_to_scip_fragment,
+tier1_backfill_resolves_cross_file_callee_via_name_index.
409 unit + 26 integration + 44 lip-cli all green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
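
A possible shape for `normalize_callee_name`, based only on the description above ("strips trailing () / . / : / #"). The signature and the repeat-until-stable loop are assumptions; the point is that a SCIP descriptor fragment and a tier-1 identifier end up on the same key.

```rust
/// Strips trailing SCIP descriptor punctuation so that a fragment such as
/// `AnalyzeImpact().` and the plain identifier `AnalyzeImpact` share a key.
pub fn normalize_callee_name(fragment: &str) -> &str {
    let mut name = fragment;
    loop {
        let trimmed = name
            .strip_suffix("()")
            .unwrap_or(name)
            .trim_end_matches(|c: char| matches!(c, '.' | ':' | '#'));
        if trimmed.len() == name.len() {
            return name; // nothing left to strip
        }
        name = trimmed;
    }
}
```

Note that this sketch only trims the tail; extracting the last path segment of a full descriptor is assumed to happen in a separate step.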
…iagnostic

Bug D (CKB testdrive follow-up): when the tier-1 back-fill resolver's
`translate` and `name_to_symbols` indexes both miss for a caller name,
the back-fill preserves the raw tier-1 URI (`lip://local//<abs>#<name>`)
as the caller in `callee_to_callers`. `def_index` was never populated
for that URI — only SCIP occurrences register there — so Phase 3 of
`blast_radius_for` skipped every such caller and Phase 4 emitted 100%
blank `symbol_uri` in the CKB testdrive.

Phase 3 now falls back to deriving the file URI by stripping the
`#<name>` fragment when `def_index` misses and the caller URI carries
the `lip://local/` scheme, using the caller URI verbatim as
`symbol_uri`. No double-indexing required. Regression test imports a
caller file with no SCIP symbols against an on-disk source so the
resolver must miss, then asserts the ImpactItem carries the full
tier-1 caller URI rather than a blank.

Also split the LIP_DEBUG_EDGES `upsert_precomputed` log into
`scip_pairs` / `tier1_pairs` — the previous `pairs=N` total was
ambiguous between "N from SCIP" (→ ScipOnly) and "0 from SCIP, N from
back-fill" (→ ScipWithTier1Edges), masking upstream SCIP producer
drift as LIP regression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
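
The Phase 3 fallback can be sketched as a small pure function: when `def_index` has no entry, derive the file URI by cutting the `#<name>` fragment, but only for `lip://local/` URIs. The function name and the choice to split on the last `#` are assumptions for illustration.

```rust
const LIP_LOCAL: &str = "lip://local/";

/// Derives the file URI for a raw tier-1 caller URI of the form
/// `lip://local//<abs>#<name>`. Returns `None` for other schemes or when
/// there is no non-empty fragment to strip.
pub fn fallback_file_uri(caller_uri: &str) -> Option<&str> {
    if !caller_uri.starts_with(LIP_LOCAL) {
        return None;
    }
    // Split on the last '#': everything before it is treated as the file URI.
    let (file, name) = caller_uri.rsplit_once('#')?;
    if name.is_empty() || file.len() <= LIP_LOCAL.len() {
        return None;
    }
    Some(file)
}
```

The caller URI itself would then be reported verbatim as `symbol_uri`, as the commit message describes.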
…astRadiusSymbol)

Additive RPC so CKB can query the forward call-graph direction with the same
enriched envelope and edges_source provenance gating as blast radius. BFS over
caller_to_callees, depth clamped 1..=8, NODE_LIMIT=200. Symmetric Bug-D-style
#<name>-strip fallback on the callee side. Semantic enrichment via
SemanticImpactItem with Static/Semantic/Both tagging (symbol embedding
preferred, file embedding fallback). edges_source lives on OutgoingImpactStatic
so CKB can apply the same EdgesSourceEmpty → skip fold gate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
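
A minimal sketch of the forward BFS with the depth clamp and node limit mentioned above. `caller_to_callees`, the 1..=8 clamp, `NODE_LIMIT = 200`, and the truncated flag come from the commit messages; the `OutgoingCalls` struct, the string-keyed map, and the function signature are assumptions, and the enrichment and `edges_source` handling are left out.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

pub const NODE_LIMIT: usize = 200;

pub struct OutgoingCalls {
    pub callees: Vec<String>,
    pub truncated: bool,
}

/// Breadth-first walk over the forward call graph starting at `seed`.
pub fn outgoing_calls(
    caller_to_callees: &HashMap<String, Vec<String>>,
    seed: &str,
    depth: u32,
) -> OutgoingCalls {
    let max_depth = depth.clamp(1, 8);
    let mut seen: HashSet<String> = HashSet::from([seed.to_string()]);
    let mut queue = VecDeque::from([(seed.to_string(), 0u32)]);
    let mut callees = Vec::new();

    while let Some((node, d)) = queue.pop_front() {
        if d == max_depth {
            continue; // do not expand past the clamped depth
        }
        for callee in caller_to_callees.get(&node).into_iter().flatten() {
            if !seen.insert(callee.clone()) {
                continue; // already visited, also guards against cycles
            }
            if callees.len() == NODE_LIMIT {
                return OutgoingCalls { callees, truncated: true };
            }
            callees.push(callee.clone());
            queue.push_back((callee.clone(), d + 1));
        }
    }
    OutgoingCalls { callees, truncated: false }
}
```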
Three-tier resolution (slice URI prefix → SCIP package descriptor →
language-appropriate manifest walk), resolved once at upsert time and
stored on FileInput. Surfaces on every ImpactItem and SemanticImpactItem
built by blast_radius_for / blast_radius_for_symbol / blast_radius_batch
and outgoing_impact_for so CKB's cross-module risk classifier gets a
useful grouping key instead of collapsing to ModuleCount=0.

Manifest coverage: Cargo.toml, go.mod, package.json, pyproject.toml,
setup.py, pubspec.yaml. Unsupported languages (C/C++/Kotlin/Swift/Java)
return None. Field is #[serde(default, skip_serializing_if = "Option::is_none")], so
the wire shape stays byte-identical for emitters that don't populate it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
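
The three-tier `module_id` resolution reads naturally as a chain of fallbacks. In the sketch below the manifest list is taken from the commit message; everything else is an assumption for illustration: the function names, passing the first two tiers in as pre-computed options, using the manifest's directory as the id, and injecting an `exists` predicate instead of touching the filesystem.

```rust
use std::path::{Path, PathBuf};

const MANIFESTS: [&str; 6] = [
    "Cargo.toml",
    "go.mod",
    "package.json",
    "pyproject.toml",
    "setup.py",
    "pubspec.yaml",
];

/// Walks upward from the file's directory and returns the first directory
/// that contains a known manifest, according to `exists`.
pub fn manifest_dir(file: &Path, exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    file.ancestors()
        .skip(1) // skip the file itself
        .find(|dir| MANIFESTS.iter().any(|m| exists(dir.join(m).as_path())))
        .map(Path::to_path_buf)
}

/// Tier order: slice URI prefix, then SCIP package, then manifest walk.
pub fn resolve_module_id(
    slice_module: Option<String>,
    scip_package: Option<String>,
    file: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Option<String> {
    slice_module
        .or(scip_package)
        .or_else(|| manifest_dir(file, exists).map(|d| d.display().to_string()))
}
```

In a real daemon the predicate would likely be `|p| p.exists()`; injecting it keeps the sketch testable without a temp directory.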
Forward-direction twin of v2.3.2's callee_name_to_callers. Index keyed
by normalize_callee_name(extract_name(from_uri)); populated at all three
edge-insertion sites (regular tier-1 upsert, SCIP pre-computed edges,
SCIP-empty tier-1 back-fill) and pruned in remove_file_call_edges.
outgoing_impact_for's BFS now consults both caller_to_callees
(URI-exact) and caller_name_to_callees (name-bridge) on every hop,
matching Phase 2 of blast_radius_for.

Closes the asymmetry where QueryOutgoingImpact seeded from a SCIP
descriptor URI (e.g. pkg#Engine#AnalyzeImpact().) returned empty
direct_items because the tier-1 back-fill had kept the raw tier-1
caller URI when the method name was ambiguous across the codebase
(translate-map miss + name_to_symbols multi-hit fallthrough).

Regression test outgoing_impact_name_bridge_for_tier1_caller_uri.
All 438 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
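
To make the name-bridge idea concrete, here is a sketch of a single BFS hop that consults both indexes. The two index names come from the commit message; the `ForwardIndexes` struct, the `bridge_key` helper (standing in for `normalize_callee_name(extract_name(..))`), and the string-keyed maps are assumptions.

```rust
use std::collections::{BTreeSet, HashMap};

pub struct ForwardIndexes {
    /// URI-exact: full caller URI -> callee URIs.
    pub caller_to_callees: HashMap<String, Vec<String>>,
    /// Name-bridge: normalised caller name -> callee URIs.
    pub caller_name_to_callees: HashMap<String, Vec<String>>,
}

/// Hypothetical key derivation: drop trailing descriptor punctuation,
/// then keep the text after the last '#'.
fn bridge_key(uri: &str) -> &str {
    let trimmed = uri.trim_end_matches(|c: char| matches!(c, '(' | ')' | '.' | ':' | '#'));
    trimmed.rsplit('#').next().unwrap_or(trimmed)
}

/// Union of URI-exact and name-bridge callees for one hop.
pub fn callees_for_hop(ix: &ForwardIndexes, caller_uri: &str) -> BTreeSet<String> {
    let exact = ix.caller_to_callees.get(caller_uri);
    let bridged = ix.caller_name_to_callees.get(bridge_key(caller_uri));
    exact.into_iter().chain(bridged).flatten().cloned().collect()
}
```

In this sketch a query seeded with the SCIP descriptor `pkg#Engine#AnalyzeImpact().` misses the URI-exact map when the edge was stored under a raw tier-1 caller URI, but still finds the callee through the shared `AnalyzeImpact` key.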
Pure CI hygiene — no behavioural change. Rust 1.95 added several
clippy lints that trip on pre-existing idioms in the v2.3 codebase:

- unnecessary_map_or:      is_some_and over map_or(false, …)
- unnecessary_sort_by:     sort_by_key + std::cmp::Reverse
- manual_pattern_char_cmp: .find(['@', '/']) over closure
- cloned_ref_to_slice_refs: std::slice::from_ref for single-element slices

Plus cargo fmt across 19 files to align with current rustfmt output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
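
For reference, the four lint rewrites listed above look roughly like this on toy functions. The functions and the `Symbol` struct are invented for illustration and are not taken from the LIP codebase; each comment shows the older idiom the lint flags.

```rust
use std::cmp::Reverse;

pub struct Symbol {
    pub name: String,
}

pub fn has_positive(x: Option<i32>) -> bool {
    // before: x.map_or(false, |v| v > 0)
    x.is_some_and(|v| v > 0)
}

pub fn sort_desc(scores: &mut [u32]) {
    // before: scores.sort_by(|a, b| b.cmp(a))
    scores.sort_by_key(|&s| Reverse(s));
}

pub fn first_separator(s: &str) -> Option<usize> {
    // before: s.find(|c| c == '@' || c == '/')
    s.find(['@', '/'])
}

pub fn total_name_len(symbols: &[Symbol]) -> usize {
    symbols.iter().map(|s| s.name.len()).sum()
}

pub fn single_name_len(symbol: &Symbol) -> usize {
    // before: cloning the value into a temporary one-element array
    total_name_len(std::slice::from_ref(symbol))
}
```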
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@SimplyLiz

Superseded by #24

@SimplyLiz SimplyLiz merged commit 9bd5bc9 into main Apr 25, 2026
7 checks passed